efg, 2017-09-02

Setup

time.1 <- Sys.time()

Required packages

library(MASS)          # fgl data
library(caret)         # preProcess, predict
library(dplyr)         # select
library(rgl)           # par3d, plot3d, movie3d, rglwidget
library(RColorBrewer)  # brewer.pal

magick from ImageMagick must be installed to create the animated GIF of the PCA.

Forensic Glass Data

rawData <- fgl
typeColorIndex <- as.integer(rawData$type)    # glass type as color index 1..6
rawData <- rawData %>% dplyr::select(-type)   # dplyr:: avoids any clash with MASS::select

Principal Component Analysis Using Caret

Let’s display the first 3 principal components in a 3D scatterplot.

nPCAcomponents <- 3
transformSetup <- preProcess(rawData, method=c("center", "scale", "pca"), pcaComp=nPCAcomponents)
pcaScores <- predict(transformSetup, rawData)
head(pcaScores)
         PC1        PC2        PC3
1 -1.1484468 -0.5282491  0.3712253
2  0.5727942 -0.7580105  0.5554059
3  0.9379605 -0.9276609  0.5536094
4  0.1417509 -0.9594279  0.1168507
5  0.3502710 -1.0886966  0.4839440
6  0.2895876 -1.3209105 -0.8666466

You can verify that preProcess gives the same PCA scores as in the SVD notebook.
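The SVD notebook is not reproduced here, so as a stand-in the following sketch compares the caret scores against base R's prcomp and against scores computed directly from svd() on the standardized data. The names X, caretScores, prcompScores and svdScores are illustrative only. The sign of each principal component is arbitrary, so the comparison uses absolute values.

```r
library(MASS)    # fgl data
library(caret)   # preProcess

X <- fgl[, names(fgl) != "type"]    # the 9 numeric measurements

# Scores from caret, computed the same way as pcaScores
setup       <- preProcess(X, method = c("center", "scale", "pca"), pcaComp = 3)
caretScores <- as.matrix(predict(setup, X))

# Scores from prcomp on centered, unit-variance columns
prcompScores <- prcomp(X, center = TRUE, scale. = TRUE)$x[, 1:3]

# Scores from the SVD of the standardized matrix: U %*% diag(d)
s         <- svd(scale(X))
svdScores <- (s$u %*% diag(s$d))[, 1:3]

# Component signs are arbitrary, so compare absolute values
maxDiffPrcomp <- max(abs(abs(caretScores) - abs(prcompScores)))
maxDiffSvd    <- max(abs(abs(caretScores) - abs(svdScores)))
c(prcomp = maxDiffPrcomp, svd = maxDiffSvd)   # both should be near zero
```

If either difference is not close to zero, the most likely cause is a mismatch in centering or scaling between the two computations.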

Interactive 3D scatterplot of first 3 principal components

Project 9-dimensional data onto 3 dimensions for display.

The first 3 PCs account for about 66% of the variance in the data.
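The share of variance can be checked from the component standard deviations. This is a minimal sketch using base R's prcomp on the standardized data rather than the caret object; X, pca, varExplained and cumVar3 are illustrative names.

```r
library(MASS)    # fgl data

X   <- fgl[, names(fgl) != "type"]               # the 9 numeric measurements
pca <- prcomp(X, center = TRUE, scale. = TRUE)   # PCA on standardized columns

# Each component's variance is its squared standard deviation
varExplained <- pca$sdev^2 / sum(pca$sdev^2)

# Cumulative share of variance for the first 3 components, in percent
cumVar3 <- sum(varExplained[1:3])
round(100 * cumVar3, 1)
```

summary(pca) reports the same proportions in its "Cumulative Proportion" row.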

typeColors <- brewer.pal(length(levels(fgl$type)), "Dark2")   
par3d("windowRect"=c(50,50,800,800))
plot3d(x=pcaScores$PC1, y=pcaScores$PC2, z=pcaScores$PC3,
       col=typeColors[typeColorIndex],
       xlab="PC1", ylab="PC2", zlab="PC3", type="s", size=3)
rglwidget(elementId="FGL1")

The Chrome browser works best for displaying the figure above.

Drag mouse over figure to rotate. Use mouse wheel to zoom in and out.

Legend

x <- barplot(rep(1,6), yaxt="n", col=typeColors)
text(x, 0.5, levels(fgl$type))

Automatically rotate for about 15 seconds when created.

Note the “Head” (vehicle headlamp) instances form a fairly good cluster, but the other types not so much.

play3d(spin3d(), duration=15)

Animated GIF

Create the animated GIF movie using magick from ImageMagick – this takes some time. Display below using HTML.

150 PNG images will be computed for 15 sec duration * 10 frames/second.
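The frame count is just duration times frame rate, as the short sketch below spells out. It assumes the 10 frames/second figure is the default of the fps argument of movie3d(), which the call here does not set. The exact number of files written may differ by one if a frame at time 0 is also rendered.

```r
duration <- 15    # seconds, as passed to movie3d()
fps      <- 10    # frames per second; assumed movie3d() default
nFrames  <- duration * fps
nFrames           # 150
```

Passing a different fps value to movie3d() would scale the number of PNG files, and the processing time, accordingly.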

movie3d(spin3d(), duration = 15, dir = getwd(),
        movie="ForensicGlass",
        verbose=FALSE, convert="magick -delay 1x%d %s*.png %s.%s")

Here’s the HTML needed in the R Markdown document to embed the GIF into the HTML file created with knitr.

<div id="PCA">
  <img src="ForensicGlass.gif" alt="">
</div>

Processing time: 54.4 sec

2017-09-07 22:09:50

References

Practical Guide to Principal Component Analysis (PCA) in R & Python from Analytics Vidhya, 2016.

Computing and visualizing PCA in R by Thiago G. Martins, 2013.

Introduction to Principal Component Analysis (PCA) by Thiago G. Martins, 2013.

Principal Components Analysis notes from class given by Brian Junker and Cosma Shalizi at CMU, 2010.

Principal Components Analysis: A How-To Manual for R by Emily Mankin. Includes